The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants (32%) stated that they did not have enough time for it, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. Only 37% of the participants performed k-fold cross-validation on the training set, and only 50% performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
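Patch-based training was the most commonly reported workaround for oversized samples. As a rough illustration, a minimal sketch of overlapping 3D patch extraction is given below; the patch size and stride are illustrative assumptions, not values taken from the survey.

```python
import numpy as np

def extract_patches(volume, patch_size=(64, 64, 64), stride=(32, 32, 32)):
    """Yield overlapping 3D patches so large volumes fit in GPU memory."""
    dz, dy, dx = patch_size
    sz, sy, sx = stride
    Z, Y, X = volume.shape
    for z in range(0, max(Z - dz, 0) + 1, sz):
        for y in range(0, max(Y - dy, 0) + 1, sy):
            for x in range(0, max(X - dx, 0) + 1, sx):
                yield volume[z:z + dz, y:y + dy, x:x + dx]

# Example: a CT-sized volume that is too large to process at once.
volume = np.random.rand(256, 256, 256).astype(np.float32)
patches = list(extract_patches(volume))
print(len(patches), patches[0].shape)  # 343 (64, 64, 64)
```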
Talking face generation aims at generating photo-realistic video portraits of a target person driven by input audio. Because the mapping from input audio to output video is one-to-many (e.g., one speech content may have multiple feasible visual appearances), learning a deterministic mapping as in previous works introduces ambiguity during training and thus causes inferior visual results. Although this one-to-many mapping can be partly alleviated by a two-stage framework (i.e., an audio-to-expression model followed by a neural-rendering model), it is still insufficient, since the prediction is produced without enough information (e.g., emotions, wrinkles, etc.). In this paper, we propose MemFace, which complements the missing information with an implicit memory and an explicit memory aligned with the two stages respectively. More specifically, the implicit memory is employed in the audio-to-expression model to capture high-level semantics in the audio-expression shared space, while the explicit memory is employed in the neural-rendering model to help synthesize pixel-level details. Our experimental results show that the proposed MemFace surpasses all state-of-the-art results across multiple scenarios consistently and significantly.
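As a rough sketch of the kind of key-value retrieval such a memory performs, the block below attends over a set of learned memory slots and adds the retrieved values back to the query; the slot count, dimensions, and residual connection are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class MemoryBlock(nn.Module):
    """Attention-based retrieval from a set of learned memory slots."""
    def __init__(self, num_slots=1000, dim=256):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(num_slots, dim))    # addressed by the query
        self.values = nn.Parameter(torch.randn(num_slots, dim))  # complementary information

    def forward(self, query):  # query: (batch, dim)
        attn = torch.softmax(query @ self.keys.t() / query.shape[-1] ** 0.5, dim=-1)
        return query + attn @ self.values  # retrieved info complements the input

feat = torch.randn(4, 256)      # e.g., audio features
out = MemoryBlock()(feat)
print(out.shape)                # torch.Size([4, 256])
```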
A critical challenge in multi-agent reinforcement learning (MARL) is for multiple agents to efficiently accomplish complex, long-horizon tasks. The agents often have difficulties in cooperating on common goals, dividing complex tasks, and planning through several stages to make progress. We propose to address these challenges by guiding agents with programs designed for parallelization, since programs as a representation contain rich structural and semantic information, and are widely used as abstractions for long-horizon tasks. Specifically, we introduce Efficient Multi-Agent Reinforcement Learning with Parallel Program Guidance (E-MAPP), a novel framework that leverages parallel programs to guide multiple agents to efficiently accomplish goals that require planning over 10+ stages. E-MAPP integrates the structural information from a parallel program, promotes cooperative behaviors grounded in program semantics, and improves time efficiency via a task allocator. We conduct extensive experiments on a series of challenging, long-horizon cooperative tasks in the Overcooked environment. Results show that E-MAPP outperforms strong baselines in terms of completion rate, time efficiency, and zero-shot generalization ability by a large margin.
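To make the idea of a parallel program concrete, here is a toy sketch (our own illustration, not E-MAPP's actual program representation) of subtasks with dependencies, where every subtask whose dependencies are met can be dispatched to idle agents in parallel.

```python
from dataclasses import dataclass, field

@dataclass
class Subtask:
    name: str
    deps: list = field(default_factory=list)  # subtasks that must finish first
    done: bool = False

# A toy parallel program: chop and cook can run concurrently once fetch is done.
program = {
    "fetch": Subtask("fetch"),
    "chop":  Subtask("chop", deps=["fetch"]),
    "cook":  Subtask("cook", deps=["fetch"]),
    "serve": Subtask("serve", deps=["chop", "cook"]),
}

def ready_subtasks(prog):
    """Subtasks whose dependencies are all complete; these can be
    allocated to idle agents in parallel."""
    return [t for t in prog.values()
            if not t.done and all(prog[d].done for d in t.deps)]

program["fetch"].done = True
print([t.name for t in ready_subtasks(program)])  # ['chop', 'cook']
```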
The Shapley value (SV) is adopted in various machine learning (ML) scenarios, including data valuation, agent valuation, and feature attribution, as it satisfies their fairness requirements. However, as exact SVs are infeasible to compute in practice, SV estimates are approximated instead. This approximation step raises an important question: do the SV estimates preserve the fairness guarantees of exact SVs? We observe that the fairness guarantees of exact SVs are too restrictive for SV estimates. Thus, we generalise Shapley fairness to probably approximate Shapley fairness and propose the fidelity score, a metric that measures the variation of SV estimates and determines how probable it is that the fairness guarantees hold. Our last theoretical contribution is a novel greedy active estimation (GAE) algorithm that maximises the lowest fidelity score and achieves a better fairness guarantee than the de facto Monte-Carlo estimation. We empirically verify that GAE outperforms several existing methods in guaranteeing fairness while remaining competitive in estimation accuracy in various ML scenarios on real-world datasets.
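For reference, the de facto Monte-Carlo baseline mentioned above estimates SVs by averaging marginal contributions over random player permutations. A minimal sketch follows; the additive toy value function is an assumption chosen so that the exact SVs are known.

```python
import random

def monte_carlo_shapley(players, value_fn, num_perms=2000, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over random permutations of the players."""
    rng = random.Random(seed)
    est = {p: 0.0 for p in players}
    for _ in range(num_perms):
        perm = players[:]
        rng.shuffle(perm)
        coalition, prev = set(), value_fn(set())
        for p in perm:
            coalition.add(p)
            cur = value_fn(coalition)
            est[p] += cur - prev  # marginal contribution of p
            prev = cur
    return {p: v / num_perms for p, v in est.items()}

# Toy additive value function: each player contributes a fixed weight,
# so the exact SV of each player equals its weight.
weights = {"a": 1.0, "b": 2.0, "c": 3.0}
v = lambda S: sum(weights[p] for p in S)
print(monte_carlo_shapley(list(weights), v))  # ~ {'a': 1.0, 'b': 2.0, 'c': 3.0}
```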
With the demand for standardized large-scale livestock farming and the development of artificial intelligence technology, much research on animal face recognition has been carried out on pigs, cattle, sheep, and other livestock. Face recognition consists of three sub-tasks: face detection, face normalization, and face identification. Most animal face recognition studies focus on face detection and face identification. Animals are often uncooperative when being photographed, so the collected animal face images are often oriented in arbitrary directions. Using such non-standard images may significantly reduce the performance of a face recognition system. However, there has been no study on normalizing animal face images with arbitrary orientations. In this study, we developed a light-weight angle detection and region-based convolutional network (LAD-RCNN), containing a new rotation angle coding method, that detects the rotation angle and the location of the animal face in a single stage. LAD-RCNN runs at 72.74 FPS (including all steps) on a single GeForce RTX 2080 Ti GPU. LAD-RCNN has been evaluated on multiple datasets, including a goat dataset and goat infrared images. Evaluation results show that the AP of face detection was more than 95% and the deviation between the detected rotation angle and the ground-truth rotation angle was less than 0.036 (i.e., 6.48°) on all test datasets. This shows that LAD-RCNN performs excellently on livestock face and orientation detection, and it is therefore well suited for livestock face detection and normalization. Code is available at https://github.com/SheepBreedingLab-HZAU/LAD-RCNN/
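The normalization step itself reduces to rotating the image by the detected angle and cropping the detected box. The sketch below shows that step with OpenCV; it is our own illustration of the idea, not LAD-RCNN's actual angle coding or post-processing.

```python
import cv2
import numpy as np

def normalize_face(image, box, angle_deg):
    """Rotate the image so a detected face ends up upright, then crop it.
    `box` is (cx, cy, w, h) in pixels; `angle_deg` is the detected rotation."""
    cx, cy, w, h = box
    M = cv2.getRotationMatrix2D((cx, cy), angle_deg, 1.0)  # rotate about face center
    rotated = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))
    x0, y0 = int(cx - w / 2), int(cy - h / 2)
    return rotated[max(y0, 0):y0 + int(h), max(x0, 0):x0 + int(w)]

img = np.zeros((480, 640, 3), np.uint8)
crop = normalize_face(img, box=(320, 240, 128, 128), angle_deg=30.0)
print(crop.shape)  # (128, 128, 3)
```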
We propose an analysis in fair learning that preserves the utility of the data while reducing prediction disparities under the criterion of group sufficiency. We focus on the scenario where the data contain multiple, or even many, subgroups, each with a limited number of samples. As a result, we present a principled method for learning a fair predictor for all subgroups by formulating it as a bilevel objective. Specifically, the subgroup-specific predictors are learned in the lower level from a small amount of data and the fair predictor. In the upper level, the fair predictor is updated to be close to all subgroup-specific predictors. We further prove that such a bilevel objective can effectively control the group sufficiency and the generalization error. We evaluate the proposed framework on real-world datasets. Empirical evidence suggests consistently improved fair predictions, as well as accuracy comparable to the baselines.
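A schematic of such a bilevel objective might look as follows; the notation and the proximity term are our own illustration of the structure described above, not the paper's exact formulation.

```latex
% Upper level: the fair predictor f_theta stays close to all subgroup predictors.
% Lower level: each subgroup predictor g_{phi_k} fits its small subgroup dataset D_k,
% regularized toward the fair predictor (hypothetical form).
\begin{align*}
\min_{\theta}\ \sum_{k=1}^{K} d\!\left(f_{\theta},\, g_{\phi_k^{*}(\theta)}\right)
\quad \text{s.t.} \quad
\phi_k^{*}(\theta) \in \arg\min_{\phi}\
  \frac{1}{|D_k|} \sum_{(x,y)\in D_k} \ell\!\left(g_{\phi}(x),\, y\right)
  + \lambda\, d\!\left(g_{\phi},\, f_{\theta}\right),
\end{align*}
% where d measures the discrepancy between two predictors and lambda trades off
% subgroup fit against closeness to the fair predictor.
```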
Multi-view graph clustering (MGC) methods are increasingly being studied due to the explosion of multi-view data with graph structural information. The critical point of MGC is to better utilize the view-specific and view-common information in the features and graphs of multiple views. However, existing works have an inherent limitation: they are unable to concurrently utilize the consensus graph information across multiple graphs and the view-specific feature information. To address this issue, we propose the Variational Graph Generator for Multi-View Graph Clustering (VGMGC). Specifically, a novel variational graph generator is proposed to extract common information among multiple graphs. This generator infers a reliable variational consensus graph based on an a priori assumption over the multiple graphs. Then, a simple yet effective graph encoder, in conjunction with the multi-view clustering objective, is presented to learn the desired graph embeddings for clustering, which embed the inferred view-common graph and the view-specific graphs together with the features. Finally, theoretical results illustrate the rationality of VGMGC by analyzing the uncertainty of the inferred consensus graph with the information bottleneck principle. Extensive experiments demonstrate the superior performance of VGMGC over state-of-the-art methods.
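As a rough sketch of what inferring a variational consensus graph can look like, the toy module below learns edge logits regularized toward the mean of the view adjacencies and samples a relaxed graph; the binary concrete relaxation and the squared-error regularizer are our own simplifying assumptions, not VGMGC's actual derivation.

```python
import torch
import torch.nn as nn

class ConsensusGraphGenerator(nn.Module):
    """Infer a probabilistic consensus graph from several view graphs.
    Simplified sketch: edge logits are learned, regularized toward the
    mean of the view adjacencies, and sampled with a concrete relaxation."""
    def __init__(self, num_nodes):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_nodes, num_nodes))

    def forward(self, adjs, tau=0.5):
        prior = torch.stack(adjs).mean(0)          # a priori consensus over views
        u = torch.rand_like(self.logits).clamp(1e-6, 1 - 1e-6)
        gumbel = torch.log(u) - torch.log1p(-u)    # logistic noise for relaxation
        edge_prob = torch.sigmoid((self.logits + gumbel) / tau)
        reg = ((torch.sigmoid(self.logits) - prior) ** 2).mean()  # stand-in regularizer
        return edge_prob, reg                      # sampled graph + regularizer

adjs = [torch.randint(0, 2, (5, 5)).float() for _ in range(3)]
graph, reg = ConsensusGraphGenerator(5)(adjs)
print(graph.shape, reg.item())
```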
Elucidating and accurately predicting the druggability and bioactivity of molecules plays a key role in drug design and discovery, and remains an open challenge. Recently, graph neural networks (GNNs) have made remarkable progress in graph-based molecular property prediction. However, current graph-based deep learning methods ignore the hierarchical information of molecules as well as the relationships between feature channels. In this study, we propose a well-designed hierarchical informative graph neural network framework (termed HiGNN) for predicting molecular properties by utilizing molecular graphs together with chemically synthesizable fragments. In addition, a plug-and-play feature-wise attention block is designed in the HiGNN architecture to adaptively recalibrate atomic features after the message-passing phase. Extensive experiments demonstrate that HiGNN achieves state-of-the-art predictive performance on many challenging drug discovery-related benchmark datasets. Furthermore, we devise a molecule-fragment similarity mechanism to comprehensively investigate the interpretability of the HiGNN model at the subgraph level, showing that HiGNN, as a powerful deep learning tool, can help chemists and pharmacists identify the key substructures for designing better molecules with the desired properties or functions. The source code is publicly available at https://github.com/idruglab/hignn.
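The feature-wise attention block is described as recalibrating atomic features after message passing. A minimal sketch in the spirit of squeeze-and-excitation over feature channels is given below; the gating architecture and reduction ratio are illustrative assumptions, not HiGNN's exact design.

```python
import torch
import torch.nn as nn

class FeatureWiseAttention(nn.Module):
    """Plug-and-play recalibration of atom features after message passing,
    in the spirit of squeeze-and-excitation over feature channels."""
    def __init__(self, dim, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(),
            nn.Linear(dim // reduction, dim), nn.Sigmoid(),
        )

    def forward(self, atom_feats):          # (num_atoms, dim)
        ctx = atom_feats.mean(dim=0)        # graph-level channel summary
        return atom_feats * self.gate(ctx)  # reweight each feature channel

feats = torch.randn(17, 64)                 # 17 atoms, 64 feature channels
out = FeatureWiseAttention(64)(feats)
print(out.shape)                            # torch.Size([17, 64])
```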
Although previous audio-driven talking face generation methods have made significant progress in improving the visual quality and lip-sync quality of synthesized videos, they pay less attention to lip motion jitter, which greatly undermines the realness of talking face videos. What causes motion jitter, and how can the problem be mitigated? In this paper, we conduct a systematic analysis of the motion jitter problem based on a state-of-the-art pipeline that uses 3D face representations to bridge the input audio and the output video, and we improve motion stability with a series of effective designs. We find that several issues can cause jitter in synthesized talking face videos: 1) jitter in the input 3D face representations; 2) training-inference mismatch; 3) a lack of dependency modeling among video frames. Accordingly, we propose three effective solutions: 1) a Gaussian-based adaptive smoothing module that smooths the 3D face representations to remove jitter in the input; 2) augmented erosions added to the input data of the neural renderer during training to simulate the distortions seen at inference and reduce the mismatch; 3) an audio-fused transformer generator that models the dependency among video frames. In addition, since there is no off-the-shelf metric for measuring motion jitter in talking face videos, we devise an objective metric (Motion Stability Index, MSI) that quantifies motion jitter by computing the reciprocal of the variance of acceleration. Extensive experimental results show the superiority of our method for motion-stable talking face video generation, with better quality than previous systems.
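The MSI metric is defined directly enough to sketch: take the second temporal difference of the motion as acceleration and return the reciprocal of its variance. In the sketch below, the trajectory layout and the aggregation over landmarks are our assumptions.

```python
import numpy as np

def motion_stability_index(trajectory, eps=1e-8):
    """MSI as described: the reciprocal of the variance of acceleration.
    `trajectory` holds per-frame positions, e.g. (num_frames, num_landmarks, 2)."""
    velocity = np.diff(trajectory, axis=0)    # first temporal difference
    acceleration = np.diff(velocity, axis=0)  # second temporal difference
    return 1.0 / (acceleration.var() + eps)   # jittery motion -> low MSI

smooth = np.cumsum(np.ones((100, 68, 2)) * 0.1, axis=0)  # constant-velocity motion
jitter = smooth + np.random.default_rng(0).normal(0, 0.5, smooth.shape)
print(motion_stability_index(smooth) > motion_stability_index(jitter))  # True
```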
Due to inter-class similarity and annotation ambiguity, noisy label facial expression recognition (FER) is more challenging than traditional noisy label classification tasks. Recent works mainly tackle this problem by filtering out large numbers of corrupted samples. In this paper, we explore dealing with noisy labels from a new feature-learning perspective. We find that FER models memorize noisy samples by focusing on a part of the features that can be considered related to the noisy labels, instead of learning from the whole features that lead to the latent truth. Inspired by this, we propose a novel Erasing Attention Consistency (EAC) method to suppress noisy samples automatically. Specifically, we first leverage the flip semantic consistency of facial images to design an imbalanced framework. We then randomly erase input images and use flip attention consistency to prevent the model from focusing on only part of the features. EAC significantly outperforms state-of-the-art noisy label methods and generalizes well to other tasks with a large number of classes, such as CIFAR100 and Tiny-ImageNet. The code is available at https://github.com/zyh-uaiaaaa/Erasing-Attention-Consistency.
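A rough sketch of the two ingredients named above, random erasing of inputs and flip attention consistency, is given below; the attention-map interface of the model and the MSE form of the consistency loss are illustrative assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def random_erase(images, size=32):
    """Randomly zero a square region so the model cannot rely on one part."""
    _, _, h, w = images.shape
    y = torch.randint(0, h - size + 1, (1,)).item()
    x = torch.randint(0, w - size + 1, (1,)).item()
    images = images.clone()
    images[:, :, y:y + size, x:x + size] = 0
    return images

def flip_attention_consistency(model, images):
    """The attention map of a horizontally flipped image should match the
    flipped attention map of the original. `model` is assumed (hypothetically)
    to return (logits, attention_map)."""
    _, attn = model(images)                               # attn: (B, H, W)
    _, attn_flip = model(torch.flip(images, dims=[-1]))
    return F.mse_loss(attn_flip, torch.flip(attn, dims=[-1]))

class DummyModel(torch.nn.Module):
    """Stand-in model whose 'attention' is just the channel mean."""
    def forward(self, x):
        return x.flatten(1).mean(1), x.mean(dim=1)

imgs = random_erase(torch.randn(4, 3, 224, 224))
print(flip_attention_consistency(DummyModel(), imgs).item())  # ~0 for this dummy
```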